The accuracy of geostatistics for regional geomagnetic modeling in an archipelago setting

Indonesia as an archipelago country relies on a limited number and clustered distributed repeat station networks. This paper explores the use of geostatistical modeling to overcome this data limitation. The model data set consisted of repeat station data from 1985 to 2015 epoch. The geostatistical methods utilized included ordinary kriging (OK), collocated cokriging (CC), and kriging with external drift (KED). The model generated using these geostatistical methods was then compared to spherical cap harmonic analyses (SCHA) and polynomial models. The geostatistical model was shown to perform better, with greater accuracy in declination, inclination, and total intensity, as indicated by the root mean square error (RMSE). We have demonstrated that the geostatistical method is a promising approach in the modeling of regional geomagnetic field, especially in areas with limited and clustered distributed data.

www.nature.com/scientificreports/ Comparing IGRF and regional geomagnetic data in China shows that the difference in total intensity can reach 143 nT 12 . For research and exploration, geomagnetic data require an accuracy of 0.1° in the declination component and 50 nT in the total intensity 13 . Furthermore, the spherical harmonic analysis equation cannot be used for a geomagnetic field in the form of an orthogonal area 14 . This has led to the introduction of new regional geomagnetic field models in the hope of producing a more accurate model. Based on IAGA Resolution No. 23 (1963), "Repeat station measurements during the International Years of the Quiet Sun (IQSY) in support of the World Magnetic Survey (WMS), " the Badan Meteorologi, Klimatologi dan Geofisika (BMKG), or the Meteorological, Climatological, and Geophysical Agency of Indonesia have conducted measurements in repeat stations to cover the region of Indonesia starting in 1985. These measurements were routinely carried out every five years. The locations of the Indonesian repeat stations comply with some of the criteria set forth by the IAGA 15 . The cluster distribution of the repeat stations of the Indonesian archipelago is characterized by nearest-neighbor analysis with a negative value (Z score = −0.78). Repeat station distribution is difficult to make uniform 16 . The repeat station locations cannot be placed on the water, as those sites cannot be accessed. No locations were found that met the standards for repeat stations 17 . This situation problematizes the regional geomagnetic field modeling framework in Indonesia (Fig. 1).
The methods generally used in regional geomagnetic field modeling include polynomial 18,19 , spline, Taylor expansion 20,21 , spherical cap harmonic analysis (SCHA) 22 , revised spherical cap harmonic analysis 23 , and revised spherical cap harmonic analysis for the Earth's surface 24 . These methods typically require dense data for modeling. The boundary effects of the existing methods are also affected by the data configuration. Numerical instability due to wide data gaps resulting from sparse data patterns needs to be balanced with statistical or physical regulation to increase model reliability 25 .
We propose the geostatistical method as an alternative for regional geomagnetic field modeling using limited and clustered data patterns. A geostatistical method predicts the value of data property which is not covered by the sample data distribution or a sparse data pattern 26 . Originally developed to predict the distribution percentages of ores for mining 27 , geostatistical methods also very accurately model natural resources 28 such as underground rivers 29 and reservoirs of oil and gas 30,31 .
Geostatistical methods are classified into two types of algorithms: traditional and geostatistical 32 . The weighted values in the traditional algorithm are based on the geometrical distance of the surrounding estimator data. This includes inverse distance weighting (IDW), nearest neighbor or polygon, and triangulation methods. Geostatistical algorithms, on the other hand, use the structure or statistics of distances from the surrounding estimator data to weigh the data property. The geostatistical kriging algorithm evaluates the distance and direction of the array of sample data points to reflect spatial correlations. Kriging can produce good estimates for anisotropic data and clustered distributions 33 . There are many kinds of kriging algorithms: e.g., simple kriging, ordinary kriging (OK) 34,35 , universal kriging, kriging with external drift (KED) 36 , and collocated cokriging (CC) 37 . www.nature.com/scientificreports/ However, geostatistical methods are rarely used for regional geomagnetic field modeling. A search on Scopus for indexed papers using the keywords "geomagnetic" and "regional" resulted in only 3 of 3524 papers referencing geomagnetic regional modeling (Fig. 2). No paper discusses the geostatistical method for geomagnetic regional modeling with special emphasis on clustered distributed data. Accordingly, the purpose of this paper is to explore the accuracy of the geostatistical method in regional geomagnetic field modeling and mapping.

Results
The geostatistical method generally yields a smaller root mean square error (RMSE) than do other existing methods for regional geomagnetic modeling in the Indonesian archipelago. The smallest RMSE for the declination component (7.03 minutes) is yielded by the collocated cokriging (CC) method and an identical range of 2.048.97 minutes using both KED and OK (Fig. 3a). The KED method yields a slightly higher RMSE of 7.19 minutes. The OK method, using stable and exponential variograms, yields RMSE values of 7.67 and 7.68 minutes, respectively. On the other hand, the polynomial and SCHA methods show an RMSE of 7.96, and 9.26 minutes, and have widerange values of 1.93-9.67, and 2.9311.97 minutes, respectively.
However, for the inclination component, not all geostatistical methods yield small RMSE values. As shown in Fig. 3b, the smallest RMSE is returned using OK, with a stable variogram (6.77 minutes). This is followed by OK with an exponential variogram (6.90 minutes) and has range values of 2.6210.83 minutes. CC has a higher RMSE than the polynomial method, and KED has a higher RMSE than both the polynomial and SCHA methods. The polynomial method has an RMSE of 6.91 minutes with a range of 3.448.55 minutes, the CC has an RMSE of 7.19 minutes and has an identical range with OK, the SCHA has an RMSE of 8.15 minutes with a range of 3.789.63 minutes, and the KED has an RMSE of 10.28 minutes with a range of 6.1112.36 minutes.
As expected, for total intensity the geostatistical method yields smaller RMSE values as shown in Fig. 3c. The smallest RMSE is obtained by the CC method (57.07 nT) with the shortest range of 21.6567.75 nT, followed by OK (63.58 nT) with a range of 25.6078.20 nT. On the other hand, KED (74.54 nT) has a higher RMSE than the polynomial method (68.97 nT), but smaller than SCHA (82.24 nT).

Discussion
The results show that the geostatistical method is a good model of the regional geomagnetic field based on data from typical clustered distribution patterns, as exemplified in Indonesia. The geostatistical model using CC was 2.23 minutes more accurate than the SCHA and polynomial models of the declination component. The CC model also was 25.17 nT more accurate than the SCHA and the polynomial models of the total intensity component. However, for the inclination component, the OK model was more accurate (0.43 minutes smaller) than the polynomial and SCHA models. The dataset in Indonesia consists of the archipelago's clustered data with average distances among the repeat stations of 250 km (Fig. 1). This distance and distribution are not spherical caps enough. The distance among the repeat station points is quite large, as the sea forms a gap among the repeat stations. The large distances between the sample data causes the accuracy of the SCHA to be lower 38,39 because additional statistical or physical regulation are required to compensate for the numerical instability 25 . The same problem occurs with the polynomial method, though typically this method will produce good accuracy if the data distribution is uniform 16 .
The RMSE values of the geomagnetic regional modeling parameters are affected by the spatial distribution of the repeat stations. The same phenomenon was also reported in China 40 , where a randomly distributed pattern produced a noticeable effect. Importantly, the SCHA and polynomial methods do not produce effective models Figure 2. The occurrences of words in the titles, keywords, and abstracts of the 3,524 Scopus-indexed papers using the keywords "geomagnetic" and "regional. " The keyword "geostatistical" has 20 occurrences. However, out of those 20, only 3 papers discuss geomagnetic regional modeling. www.nature.com/scientificreports/ based on clustered distributed patterns. Both these methods require more regularly and densely distributed data 25,41,42 . That the CC and OK methods might better fit a wider range of data distributions was also suggested by Webster and Oliver 26 and Zhao et al. 43 . The CC and OK methods are suitable for random, clustered, and anisotropic distributions 26,43 . They are also able to deal with different weights between clusters using varied numbers of data sets, which can reduce bias in the estimations 32 . Meanwhile, the SCHA and polynomial methods cannot accept a different number of data sets between clusters. The average RMSE of the inclination component from OK is slightly lower than with the polynomial method (0.13 minutes). For the declination and total intensity components whose contours are a curving isoline, the CC method produces a smaller average RMSE than the OK, SCHA, and polynomial methods. This is ideal for the curving isoline of geomagnetic components that results from additional secondary data (IGRF-13). This result was also in agreement with the work of Rivoirard 37 and Han et al. 44 that showed CC could result in a lower RMSE for complex data property. The larger RMSE of the SCHA and polynomial methods probably results from the curving isoline of the geomagnetic declination. Indonesia, which lays in the equator, always has a banding isoline; the declination's contours in Indonesia deviate along the longitude (Fig. 4). That the SCHA and polynomial methods have a larger RMSE with a curving isoline was also found by Gu et al. 45 in China and in Europe by Korte et al. 11 . The polynomial method is more accurate for a straight geomagnetic isoline 46 .
Our results also reveal that when adding more repeat station data sets, the geostatistical method results in better increment accuracy than the SCHA and polynomial methods. The research used the 2015 data set (with 28% additional sample data), compared to the data set from 1990 to 2010 (Fig. 3d). The accuracy of the OK method increased by 47% for the total intensity component. This increment of OK accuracy is 18% higher than that of the SCHA and polynomial methods. The accuracy of OK also increases by 19% for the inclination component, www.nature.com/scientificreports/ which is 9% higher than SCHA and polynomial values. For the declination component, the accuracy of OK also increases by 50%, or 9% higher than SCHA, but a bit lower (3%) than polynomial. However, the accuracy of the CC and KED methods did not increase as significantly as the OK method. The use of a secondary variable 36,37,47,48 in the CC and KED methods may account for the smaller increase. This result also showed that with additional data sets, the SCHA and polynomial methods do not result in an accuracy increase as significant as those seen in previous research from Europe 49,50 , where the data set was regularly distributed.

Methods
We used repeat station data from 1985 to 2015 published by BMKG, and the Indonesian authority granted permission to conduct geomagnetic measurements under the IAGA resolution. Most of the repeat stations are located at the airports of the Indonesian islands. This has the advantage that, in addition to fulfilling the special requirements for the location of the repeat stations, the stations can also support the calibration of runway www.nature.com/scientificreports/ azimuth, which is important for aircraft navigation 51,52 . These measurements were then routinely carried out every five years. The data has been reduced by diurnal and secular variation to get the same time for every period.
To validate the regional geomagnetic model produced by the geostatistical methods (OK, CC, and KED) and the geomagnetic methods (SCHA and polynomial), we used 15% of the total existing repeat station data gathered from 1985 to 2015. The data used for model validation purposes was excluded from the modeling process. The composition and distribution of the validation data were selected randomly (Fig. 1). For the main geomagnetic field, we used the Definitive Geomagnetic Reference Field (DGRF) data generated from IGRF-13, since the global model is more accurate than the regional one 53 .
This research tested the accuracy of the geostatistical method and the existing methods to model the regional geomagnetic field within the Indonesian territory. From the geostatistical method group, we used the OK, CC, and KED methods. The OK equation is as follows 32 : where Z OK is the data estimation result, λ is the weight of each primary data value, and Z(xi) is the data value in the ith location. This method is easily applied and considered the best linear unbiased estimator (BLUE). This method has been used in many fields of earth science [54][55][56][57] and produces reasonable estimates.
The CC equation is as follows 37 : where Z CC is the data estimation result, λ is the weight of the measurement datum in the ith location, Z 1 is the measurement datum in the ith location, μ is the weight of the available complementary datum from IGRF-13, and Z 2 is the available complementary datum from IGRF-13. The KED equation is as follows 36 : where Z KED (x 0 ) is the estimation result in location x 0 , Z 1 (x i ) is the measurement datum from the BMKG in the ith repeat station, λ i is the weight of Z 1 (x i ) and Z 2 (x i ), and a is the slope of the available complementary data from IGRF-13 in location x 0 . The geostatistical method was then compared with the SCHA and polynomial methods. The SCHA method was chosen because it has been used frequently by researchers in recent decades 25 . The potential field equation V in the SCHA method is the following: where V = geomagnetic potency, a = earth's radius (6,371.2 km), r = radial distance of the set location from the Earth's core, θ = colatitude (90°-latitude), λ = longitude, P m n k θ = Legendre function of the associated Schmidt noninteger n k (m) and m, g i,m n k and h i,m n k are the Gaussian Earth's core magnetic field coefficients, g e,m n k and h e,m n k are the Gaussian electric current coefficients.
The polynomial method was also selected for comparison because it is usually more accurate for regional geomagnetic modeling 45,58 . The Taylor polynomial normal field model equation is given below 14,45,58 : For the OK, SCHA, and polynomial methods, the geomagnetic data for each component X, Y, and Z were decremented with the DGRF data to calculate the geomagnetic field anomaly. The short spectrum of the SCHA method was not suitable to model the main geomagnetic field 38 . Mathematically, the procedure was formulated in Eqs. (6), (7), and (8): where ΔX, ΔY, and ΔZ are the geomagnetic field anomaly of each component and DGRF_X, DGRF_Y, and DGRF_Z are the main geomagnetic definitive data obtained from the IGRF-13 model for each component.
The number of sample data sets and locations used for each epoch's calculation is shown in Table 1. Equation (1) was used for the OK method, Eq. (2) was used for the CC method, Eq. (3) was used for the KED method, Eq. (4) was used for the SCHA method, and Eq. (5) was used for the polynomial method with θ 0 = 2° and ϕ 0 = 117.5°. The OK method used stable and exponential variograms. The variogram parameters used in the OK model are range 3.41, sill 0.0023, and nugget 0.0001 for the declination component; range 10.2, sill 0.19, and nugget 0.01 for the inclination component; and range 5.95, sill 31,216, and nugget 1 for total intensity (Fig. 5). The selection of the variogram affects the estimation results of the kriging method 26 . The CC and KED used a spherical variogram. For the SCHA method, an 8th-order truncation was used for the geomagnetic anomaly, because the RMSE of the ΔX, ΔY, and ΔZ components for the 8th order was smaller and stable. For the polynomial method, the 5th order was used because it produced a small RMSE value. We then analyzed the estimation data to determine the root mean square error (RMSE). We then used RMSE to compare the accuracy of the geostatistical method in modeling the regional geomagnetic field with the SCHA and polynomial methods.